Automatically Tuning Parallel and Parallelized Programs

Abstract

In today’s multicore era, parallelization of serial code is essential in order to exploit the architecture's performance potential. Parallelizing legacy code in particular is a challenge: manual effort must go either into algorithmic modifications or into analyzing computationally intensive code sections for the best possible parallel performance, and both are difficult and time-consuming. Automatic parallelization uses sophisticated compile-time techniques to identify parallelism in serial programs, thus reducing the burden on the program developer. Similar sophistication is needed to improve the performance of hand-parallelized programs. A key difficulty is that optimizing compilers are generally unable to estimate the performance of an application, or even of a program section, at compile time, and so the task of performance improvement invariably rests with the developer. Automatic tuning uses static analysis and runtime performance metrics to determine the best possible compile-time approach for optimal application performance. This paper describes an offline tuning approach that uses a source-to-source parallelizing compiler, Cetus, and a tuning framework to tune parallel application performance. The implementation uses an existing, generic tuning algorithm called Combined Elimination to study the effect of serializing parallelizable loops based on measured whole-program execution time, and produces a combination of parallel loops that equals or improves on the performance of the original program. We evaluated our algorithm on a suite of hand-parallelized C benchmarks from the SPEC OMP2001 and NAS Parallel benchmark suites and provide two sets of results. The first ignores hand-parallelized loops and tunes application performance based only on Cetus-parallelized loops. The second considers the tuning of additional parallelism in hand-parallelized code. We show that our implementation always performs nearly as well as or better than serial code when tuning only Cetus-parallelized loops, and as well as or better than hand-parallelized code when tuning additional parallelism.
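The loop-serialization search described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it is a simplified elimination loop in the spirit of Combined Elimination, and the loop names, the timing figures, and the additive cost model standing in for measured whole-program execution time are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

def combined_elimination(
    loops: Iterable[str],
    measure: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Return the loops left parallel and the run time of that configuration.

    `measure` takes the set of loops that run in parallel and returns a
    whole-program execution time (here a toy model, not a real run).
    """
    parallel = set(loops)
    base = measure(frozenset(parallel))
    while True:
        # Relative change in run time when each loop is serialized on its own.
        change = {}
        for loop in parallel:
            t = measure(frozenset(parallel - {loop}))
            change[loop] = (t - base) / base
        # Loops whose serialization speeds the program up, most harmful first.
        harmful = sorted((l for l in change if change[l] < 0), key=lambda l: change[l])
        if not harmful:
            return frozenset(parallel), base
        # Serialize the most harmful loop and take the result as the new baseline.
        parallel.remove(harmful[0])
        base = measure(frozenset(parallel))
        # Re-check the other candidates against the updated baseline.
        for loop in harmful[1:]:
            t = measure(frozenset(parallel - {loop}))
            if t < base:
                parallel.remove(loop)
                base = t

# Hypothetical per-loop times in seconds: (serial, parallel incl. overhead).
TIMES: Dict[str, Tuple[float, float]] = {
    "L1": (10.0, 3.0),
    "L2": (1.0, 2.5),   # parallel overhead outweighs the gain
    "L3": (4.0, 4.5),   # parallel overhead outweighs the gain
    "L4": (8.0, 2.0),
}

def toy_measure(parallel: FrozenSet[str]) -> float:
    """Additive stand-in for a timed run of the whole program."""
    return sum(par if name in parallel else ser for name, (ser, par) in TIMES.items())

all_parallel_time = toy_measure(frozenset(TIMES))
kept, best = combined_elimination(TIMES, toy_measure)
print(sorted(kept), best, all_parallel_time)
```

Because a loop is serialized only when the measured time drops, the final configuration is never slower than the starting one under this model, which mirrors the "equal or better" property the abstract claims. Real measurements are noisy, so a practical version would need repeated runs or a tolerance threshold.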


Bibliographic details

Authors: Dave, Chirag; Eigenmann, Rudolf
Year: 2010
Original format: PDF

Similar literature

1. Automatic tuning framework for parallelized programs [J]. Dariusz Burak, Marcin Radziewicz, Tomasz Wiercinski. Pomiary Automatyka Kontrola, 2010, no. 12.
2. Programming, Tuning and Automatic Parallelization of Irregular Divide-and-Conquer Applications in DAMPVM/DAC [J]. Pawel Czarnul. International Journal of High Performance Computing Applications, 2003, no. 1.
3. Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime [J]. Alcides Fonseca, Bruno Cabral, Joao Rafael. International Journal of Parallel Programming, 2016, no. 6.
4. Automatically Tuning Parallel and Parallelized Programs [C]. Chirag Dave, Rudolf Eigenmann. Languages and Compilers for Parallel Computing, 2009.
5. Automatic program parallelization using stateless parallel processing architecture [D]. Sun, Feijian. 2004.
6. Parallels between Global Transcriptional Programs of Polarizing Caco-2 Intestinal Epithelial Cells In Vitro and Gene Expression Programs in Normal Colon and Colon Cancer [O]. Annika M. Sääf, Jennifer M. Halbleib, Xin Chen, 1888
7. Implementation of Parallel Code Generator under Static Execution Control and Proposal of Performance Tuning Tool for Automatic Parallelizing Translator for C Programs [O]. 近藤竜也, 甲斐宗徳. 2017.
